138 research outputs found
To Share or Not to Share in Client-Side Encrypted Clouds
With the advent of cloud computing, a number of cloud providers have arisen
to provide Storage-as-a-Service (SaaS) offerings to both regular consumers and
business organizations. SaaS (different than Software-as-a-Service in this
context) refers to an architectural model in which a cloud provider provides
digital storage on their own infrastructure. Three models exist amongst SaaS
providers for protecting the confidentiality data stored in the cloud: 1) no
encryption (data is stored in plain text), 2) server-side encryption (data is
encrypted once uploaded), and 3) client-side encryption (data is encrypted
prior to upload). This paper seeks to identify weaknesses in the third model,
as it claims to offer 100% user data confidentiality throughout all data
transactions (e.g., upload, download, sharing) through a combination of Network
Traffic Analysis, Source Code Decompilation, and Source Code Disassembly. The
weaknesses we uncovered primarily center around the fact that the cloud
providers we evaluated were each operating in a Certificate Authority capacity
to facilitate data sharing. In this capacity, they assume the role of both
certificate issuer and certificate authorizer as denoted in a Public-Key
Infrastructure (PKI) scheme - which gives them the ability to view user data
contradicting their claims of 100% data confidentiality. We have collated our
analysis and findings in this paper and explore some potential solutions to
address these weaknesses in these sharing methods. The solutions proposed are a
combination of best practices associated with the use of PKI and other
cryptographic primitives generally accepted for protecting the confidentiality
of shared information
Interpretable Probabilistic Password Strength Meters via Deep Learning
Probabilistic password strength meters have been proved to be the most
accurate tools to measure password strength. Unfortunately, by construction,
they are limited to solely produce an opaque security estimation that fails to
fully support the user during the password composition. In the present work, we
move the first steps towards cracking the intelligibility barrier of this
compelling class of meters. We show that probabilistic password meters
inherently own the capability of describing the latent relation occurring
between password strength and password structure. In our approach, the security
contribution of each character composing a password is disentangled and used to
provide explicit fine-grained feedback for the user. Furthermore, unlike
existing heuristic constructions, our method is free from any human bias, and,
more importantly, its feedback has a clear probabilistic interpretation. In our
contribution: (1) we formulate the theoretical foundations of interpretable
probabilistic password strength meters; (2) we describe how they can be
implemented via an efficient and lightweight deep learning framework suitable
for client-side operability.Comment: An abridged version of this paper appears in the proceedings of the
25th European Symposium on Research in Computer Security (ESORICS) 202
Deep Models Under the GAN: Information Leakage from Collaborative Deep Learning
Deep Learning has recently become hugely popular in machine learning,
providing significant improvements in classification accuracy in the presence
of highly-structured and large databases.
Researchers have also considered privacy implications of deep learning.
Models are typically trained in a centralized manner with all the data being
processed by the same training algorithm. If the data is a collection of users'
private data, including habits, personal pictures, geographical positions,
interests, and more, the centralized server will have access to sensitive
information that could potentially be mishandled. To tackle this problem,
collaborative deep learning models have recently been proposed where parties
locally train their deep learning structures and only share a subset of the
parameters in the attempt to keep their respective training sets private.
Parameters can also be obfuscated via differential privacy (DP) to make
information extraction even more challenging, as proposed by Shokri and
Shmatikov at CCS'15.
Unfortunately, we show that any privacy-preserving collaborative deep
learning is susceptible to a powerful attack that we devise in this paper. In
particular, we show that a distributed, federated, or decentralized deep
learning approach is fundamentally broken and does not protect the training
sets of honest participants. The attack we developed exploits the real-time
nature of the learning process that allows the adversary to train a Generative
Adversarial Network (GAN) that generates prototypical samples of the targeted
training set that was meant to be private (the samples generated by the GAN are
intended to come from the same distribution as the training data).
Interestingly, we show that record-level DP applied to the shared parameters of
the model, as suggested in previous work, is ineffective (i.e., record-level DP
is not designed to address our attack).Comment: ACM CCS'17, 16 pages, 18 figure
Entangled cloud storage
Entangled cloud storage (Aspnes et al., ESORICS 2004) enables a set of clients to “entangle” their files into a single clew to be stored by a (potentially malicious) cloud provider. The entanglement makes it impossible to modify or delete significant part of the clew without affecting all files encoded in the clew. A clew keeps the files in it private but still lets each client recover his own data by interacting with the cloud provider; no cooperation from other clients is needed. At the same time, the cloud provider is discouraged from altering or overwriting any significant part of the clew as this will imply that none of the clients can recover their files. We put forward the first simulation-based security definition for entangled cloud storage, in the framework of universal composability (Canetti, 2001). We then construct a protocol satisfying our security definition, relying on an entangled encoding scheme based on privacy-preserving polynomial interpolation; entangled encodings were originally proposed by Aspnes et al. as useful tools for the purpose of data entanglement. As a contribution of independent interest we revisit the security notions for entangled encodings, putting forward stronger definitions than previous work (that for instance did not consider collusion between clients and the cloud provider). Protocols for entangled cloud storage find application in the cloud setting, where clients store their files on a remote server and need to be ensured that the cloud provider will not modify or delete their data illegitimately. Current solutions, e.g., based on Provable Data Possession and Proof of Retrievability, require the server to be challenged regularly to provide evidence that the clients’ files are stored at a given time. Entangled cloud storage provides an alternative approach where any single client operates implicitly on behalf of all others, i.e., as long as one client's files are intact, the entire remote database continues to be safe and unblemishe
Recent advances in security and privacy in big data
Big data has become an important topic in science, engineering, medicine, healthcare, finance, business and ultimately society itself. Big data refers to the massive amount of digital information stored or transmitted in computer systems. Approximately, 2.5 quintillion bytes of data are created every day. Almost 90% of data in the world today are created in the last two years alone. Security and privacy issues becomes more critical due to large volumes and variety, due to data hosted in large-scale cloud infrastructures, diversity of data sources and formats, streaming nature of data acquisition and high volume inter-cloud migration. In large-scale cloud infrastructures, a diversity of software platforms provides more opportunities to attackers. Traditional security mechanisms, which are usually invented for securing small-scale data, are inadequate. With a rapid growth of big data applications, it has become critical to introduce new security technology to accommodate the need of big data applications. The objective of this special issue is to capture the latest advances in this research field
Universal Neural-Cracking-Machines: Self-Configurable Password Models from Auxiliary Data
We develop the first universal password model -- a password model that, once
pre-trained, can automatically adapt to any password distribution. To achieve
this result, the model does not need to access any plaintext passwords from the
target set. Instead, it exploits users' auxiliary information, such as email
addresses, as a proxy signal to predict the underlying target password
distribution. The model uses deep learning to capture the correlation between
the auxiliary data of a group of users (e.g., users of a web application) and
their passwords. It then exploits those patterns to create a tailored password
model for the target community at inference time. No further training steps,
targeted data collection, or prior knowledge of the community's password
distribution is required. Besides defining a new state-of-the-art for password
strength estimation, our model enables any end-user (e.g., system
administrators) to autonomously generate tailored password models for their
systems without the often unworkable requirement of collecting suitable
training data and fitting the underlying password model. Ultimately, our
framework enables the democratization of well-calibrated password models to the
community, addressing a major challenge in the deployment of password security
solutions on a large scale.Comment: v0.0
No Place to Hide that Bytes won't Reveal: Sniffing Location-Based Encrypted Traffic to Track a User's Position
News reports of the last few years indicated that several intelligence
agencies are able to monitor large networks or entire portions of the Internet
backbone. Such a powerful adversary has only recently been considered by the
academic literature. In this paper, we propose a new adversary model for
Location Based Services (LBSs). The model takes into account an unauthorized
third party, different from the LBS provider itself, that wants to infer the
location and monitor the movements of a LBS user. We show that such an
adversary can extrapolate the position of a target user by just analyzing the
size and the timing of the encrypted traffic exchanged between that user and
the LBS provider. We performed a thorough analysis of a widely deployed
location based app that comes pre-installed with many Android devices:
GoogleNow. The results are encouraging and highlight the importance of devising
more effective countermeasures against powerful adversaries to preserve the
privacy of LBS users.Comment: 14 pages, 9th International Conference on Network and System Security
(NSS 2015
No NAT'd User left Behind: Fingerprinting Users behind NAT from NetFlow Records alone
It is generally recognized that the traffic generated by an individual
connected to a network acts as his biometric signature. Several tools exploit
this fact to fingerprint and monitor users. Often, though, these tools assume
to access the entire traffic, including IP addresses and payloads. This is not
feasible on the grounds that both performance and privacy would be negatively
affected. In reality, most ISPs convert user traffic into NetFlow records for a
concise representation that does not include, for instance, any payloads. More
importantly, large and distributed networks are usually NAT'd, thus a few IP
addresses may be associated to thousands of users. We devised a new
fingerprinting framework that overcomes these hurdles. Our system is able to
analyze a huge amount of network traffic represented as NetFlows, with the
intent to track people. It does so by accurately inferring when users are
connected to the network and which IP addresses they are using, even though
thousands of users are hidden behind NAT. Our prototype implementation was
deployed and tested within an existing large metropolitan WiFi network serving
about 200,000 users, with an average load of more than 1,000 users
simultaneously connected behind 2 NAT'd IP addresses only. Our solution turned
out to be very effective, with an accuracy greater than 90%. We also devised
new tools and refined existing ones that may be applied to other contexts
related to NetFlow analysis
- …